20 research outputs found
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in
biological data, demanding the development of large-scale bioinformatics
experiments. Because these experiments are computation- and data-intensive,
they require high-performance computing (HPC) techniques and can benefit from
specialized technologies such as Scientific Workflow Management Systems (SWfMS)
and databases. In this work, we present BioWorkbench, a framework for managing
and analyzing bioinformatics experiments. This framework automatically collects
provenance data, including both performance data from workflow execution and
data from the scientific domain of the workflow application. Provenance data
can be analyzed through a web application that abstracts a set of queries to
the provenance database, simplifying access to provenance information. We
evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree
assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a
RASopathy analysis workflow. We analyze each workflow from both computational
and scientific domain perspectives, by using queries to a provenance and
annotation database. Some of these queries are available as a pre-built feature
of the BioWorkbench web application. Through the provenance data, we show that
the framework is scalable and achieves high performance, reducing the execution
time of the case studies by up to 98%. We also show how the application of machine
learning techniques can enrich the analysis process.
Phylogenomics-Based Reconstruction of Protozoan Species Tree
Implementation of a non-invasive in situ methodology for evaluating the hydration characteristics of lipstick
This work is an experimental study whose units of analysis were 101 female volunteers from the database of the Efficacy Demonstration and Sensory Analysis Laboratory of the Ebel Technological Institute, all of whom met the predetermined inclusion criteria.
The sample consisted of 4 lipsticks with different active ingredients, whose cosmetic efficacy was determined by measuring the biological indicators hydration, transepidermal water loss, and pH on the volunteers' lower lip.
Five groups of volunteers were used: experimental groups A, B, C, and D, which received the different types of lipstick, and a control group. Measurements were taken for all groups at baseline, at 30 minutes (immediate), and at 2 and 4 weeks.
The moisturizing effect of lipsticks A, B, C, and D at the 30-minute control was highly significant; lipstick C maintained the same level of lip hydration until the end of the treatment.
The decrease in transepidermal water loss from the lips with lipsticks A, B, C, and D was highly significant until the end of the treatment.
The pH of the lips did not vary significantly until the end of the treatment.
The hydration level with lipstick C was different from, and higher than, that of the other lipsticks.
Lipstick C showed better cosmetic functionality, based on the measurement of its attributes through the biological indicators. The active ingredient lipexel is responsible for the efficacy of the attributed cosmetic properties.
The present work is an experimental study that took as its units of analysis 101 volunteers registered in the database of the Laboratory of Demonstration of Effectiveness and Sensory Analysis of the Ebel Technological Institute, who fulfilled the predetermined inclusion criteria.
The cosmetic effectiveness of the samples was determined by measuring the biological indicators hydration, transepidermal water loss, and pH on the volunteers' lower lip.
Five groups of volunteers were used: groups A, B, C, and D, which received the different types of lipstick, and a control group. All groups were measured at baseline, at 30 minutes (immediate), and at the 2nd and 4th week.
The moisturizing effect of lipsticks A, B, C, and D at the 30-minute control was highly significant; lipstick C kept the same hydration level in the lips until the end of the treatment.
With lipsticks A, B, C, and D, transepidermal water loss from the lips decreased and remained low at the end of the treatment.
The pH of the lips did not vary significantly until the end of the treatment.
The hydration level with lipstick C was different from, and higher than, that of the others.
Lipstick C showed better cosmetic functionality, based on the measurement of its attributes through the biological indicators. The active ingredient lipexel is responsible for the effectiveness of the cosmetic attributes.
Using a phylogenomic approach to study protozoa
Submitted by Tatiana Silva on 2012-12-27T17:46:12Z
Made available in DSpace on 2012-12-27T17:46:12Z (GMT).
Previous issue date: 2010
Fundação Oswaldo Cruz. Instituto Oswaldo Cruz. Rio de Janeiro, RJ, Brasil
Reconstructing evolutionary history and establishing hypotheses about the phylogenetic relationships of protozoa, as well as of the genes encoded by Mobile Genetic Elements (MGEs), requires several approaches and tools that are not available in an integrated or user-friendly form. Different phylogenetic, phylogenomic, and evolutionary approaches are needed to infer species phylogeny and to study poorly conserved genes such as reverse transcriptase, the most representative gene of class I MGEs, the retrotransposons. The main phylogenetic algorithms and the programs that run them have been unified into a single system, ARPA, written in the Python programming language. The ARPA system and its web interface are hosted at FIOCRUZ and are available at http://arpa.biowebdb.org. They are being integrated into the ProtozoaDB database system (http://protozoadb.biowebdb.org) and the Stingray semi-automatic annotation system (http://stingray.biowebdb.org/). An approach grounded in phylogenomics and evolution was used to pursue five objectives: (i) to analyze and infer the phylogeny of genes related to drug resistance in protozoa, (ii) to reconstruct the protozoan species tree, (iii) to carry out phylogenomic studies of MGEs in protozoa, (iv) to infer the phylogeny of telomerase and of retrotransposable elements in Tri-Tryps, and (v) to adapt and extend the Phylo schema of the GUS database to store phylogenetic information.
The main results for each objective are: (i) the phylogenetic inferences for the AQP, hsp70, GP63, TRYR, and MRPA genes, which are related to drug resistance in protozoa, demonstrated the viability of running the ARPA system; (ii) the protozoan species tree built with the supermatrix approach proved reliable, and the PTP test and the G1 statistic showed that the molecular data in this study carry phylogenetic signal; (iii) RAXML was the most consistent program in handling the different levels of polymorphism in these genes; positive selection in these genes was detected in silico in the paired analyses of models M1-M2 and M7-M8, whereas the M0-M3 pair indicated high variability of the ω ratio among sites; (iv) telomerase was found to be monophyletic and most closely related to the reverse transcriptase of non-LTR retrotransposons; (v) a new Phylo schema was designed and incorporated into GUS 3.5, extending it to store data obtained from phylogenetic inferences.
The main conclusions are: (i) the ARPA system is a viable, efficient, easy, and time-saving alternative for phylogenomic analyses. RAXML was considered the most consistent program, and trees built from whole sequences and/or sequences trimmed with TRIMAL gave the best results. The supermatrix approach gave better results than the supertree approach; (ii) the relationships among protozoan groups agree with earlier studies in the literature, which also found protozoa to be monophyletic. More data/genes need to be included to obtain a robust tree; (iii) the trees of the MGE genes were reconstructed and the phylogeny was inferred for each of them. Model M3 indicated high variability of the ω ratio among sites, and models M7 and M8 indicated positive selection for all MGE genes; (iv) telomerase formed a monophyletic group more closely related to the reverse transcriptase of non-LTR retrotransposons; (v) the Phylo schema stores the data obtained from phylogenetic experiments while preserving the phylogenetic inheritance relationships among the taxa, which makes it possible to run queries using information about the branches, nodes, and taxa of the tree.
The reconstruction of the evolutionary history, as well as the establishment of the
hypotheses that demonstrate the phylogenetic relationships of the genes encoded by Mobile
Genetic Elements (MGEs) require the use of various tools and approaches, which are not
available in a user-friendly or integrated interface. Different phylogenetic, phylogenomic, and evolutionary approaches are necessary to infer the species phylogeny. The same approaches are required to study less conserved genes such as reverse transcriptase, the most representative gene of class I MGEs, the retrotransposons. The main phylogenetic algorithms and programs developed by our group have been unified into a single system, ARPA, written in the Python programming language. The ARPA system and its web interface are hosted at FIOCRUZ and are available at http://arpa.biowebdb.org. They are currently being integrated into the ProtozoaDB database system (http://protozoadb.biowebdb.org) and the semi-automatic annotation system Stingray (http://stingray.biowebdb.org/). An approach based on the fundamentals of evolution and phylogenomics was applied to achieve five objectives: (i) to analyze and infer the phylogeny of genes related to drug resistance in protozoan genomes, (ii) to reconstruct a protozoan species tree, (iii) to conduct phylogenomic studies of MGEs in Protozoa, (iv) to infer the phylogeny of telomerase and the retrotransposable elements in Tri-Tryps, and (v) to adapt and extend the Phylo schema in the GUS database to store phylogenetic information.
The results obtained for each objective were: (i) The construction of the phylogenetic trees of the genes AQP, hsp70, GP63, TRYR, and MRPA, which are related to drug resistance in protozoa, demonstrated the viability of running the ARPA system. (ii) The protozoan species tree built using the supermatrix approach proved to be reliable. The PTP test and the G1 statistic demonstrated that the molecular data of this study carry phylogenetic signal. (iii) PAUP-AV was shown to be the most consistent program, and PHYML the least consistent, in dealing with the different levels of polymorphism of these genes. Positive selection in MGE genes in Protozoa was detected in silico in the paired analyses of models M1-M2 and M7-M8, whereas the pair M0-M3 indicated high variability of the ω ratio among sites. (iv) Telomerase was found to be monophyletic and most closely related to the reverse transcriptase of the non-LTR retrotransposons. (v) A new Phylo schema was designed and incorporated into GUS 3.5, extending it to store the data obtained from phylogenetic experiments.
The conclusions are: (i) The ARPA system is a viable, efficient, easy, and time-saving alternative for phylogenomic analysis. RAXML was considered the most consistent program, and the trees constructed using the entire sequences and/or the sequences trimmed with TRIMAL showed the best results. The supermatrix approach showed better results than the supertree approach. (ii) The relationships between protozoan groups agree with previous studies, which also found protozoa to be monophyletic. The inclusion of more data/genes is required to obtain a consistent tree. (iii) In the trees of the MGEs, PAUP-AV was the most consistent program, and PHYML the least consistent, in dealing with the different levels of polymorphism of these genes. Model M3 showed high variability of the ω ratio among sites, and models M7 and M8 indicated the presence of positive selection for all MGE genes. (iv) Telomerase formed a monophyletic group more closely related to the reverse transcriptase of the non-LTR retrotransposons. (v) The Phylo schema stores the data obtained from phylogenetic experiments, preserving the phylogenetic inheritance relationships among the taxa, which makes it possible to run queries using information from the branches, nodes, and taxa of the tree.
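The supermatrix approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes each gene has already been aligned separately and that taxa missing from a gene are padded with a placeholder character. The function name `build_supermatrix`, the `?` placeholder, and the toy sequences are illustrative assumptions, not part of ARPA or of the thesis data.

```python
from typing import Dict


def build_supermatrix(
    gene_alignments: Dict[str, Dict[str, str]], missing: str = "?"
) -> Dict[str, str]:
    """Concatenate per-gene alignments into one row per taxon.

    gene_alignments maps gene name -> {taxon name -> aligned sequence}.
    Taxa absent from a gene are padded with the `missing` character.
    """
    # Collect every taxon that appears in at least one gene alignment.
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    rows = {taxon: [] for taxon in taxa}

    # Process genes in a fixed (sorted) order so columns are reproducible.
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment for {gene} has unequal row lengths")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy alignments (hypothetical, for illustration only).
alignments = {
    "gene1": {"taxonA": "ATGC", "taxonB": "ATGA", "taxonC": "A-GC"},
    "gene2": {"taxonA": "GGT", "taxonC": "GAT"},
}
matrix = build_supermatrix(alignments)
for taxon, sequence in matrix.items():
    print(taxon, sequence)
```

The concatenated matrix would then be handed to a tree-inference program. A supertree approach, by contrast, infers one tree per gene and combines the trees afterwards.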
Analyzing provenance across heterogeneous provenance graphs
Provenance generated by different workflow systems is generally expressed using different formats. This is not an issue when scientists analyze provenance graphs in isolation, or when they use the same workflow system. However, analyzing heterogeneous provenance graphs from multiple systems poses a challenge. To address this problem we adopt ProvONE as an integration model, and show how different provenance databases can be converted to a global ProvONE schema. Scientists can then query this integrated database, exploring and linking provenance across several different workflows that may represent different implementations of the same experiment. To illustrate the feasibility of our approach, we developed conceptual mappings between the provenance databases of two workflow systems (e-Science Central and SciCumulus). We provide cartridges that implement these mappings and generate an integrated provenance database expressed as Prolog facts. To demonstrate its usage, we have developed Prolog rules that enable scientists to query the integrated database.
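The idea of mapping heterogeneous provenance records onto one shared vocabulary and emitting Prolog facts can be sketched as follows. This is only an illustration under stated assumptions: the two source record layouts, the field maps, and the predicate names (`program/2`, `used/2`, `was_generated_by/2`) are hypothetical and are not the actual e-Science Central, SciCumulus, or ProvONE schemas, nor the authors' cartridges.

```python
from typing import Dict, List


def prolog_atom(value: str) -> str:
    """Quote a value as a Prolog atom, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


# Hypothetical records exported from two different workflow systems.
system_a_rows = [{"task": "align", "input_file": "seqs.fasta", "output_file": "aln.phy"}]
system_b_rows = [{"block_name": "tree", "consumed": "aln.phy", "produced": "tree.nwk"}]

# Conceptual mapping: common field name -> system-specific field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "system_a": {"program": "task", "used": "input_file", "generated": "output_file"},
    "system_b": {"program": "block_name", "used": "consumed", "generated": "produced"},
}


def to_facts(system: str, rows: List[Dict[str, str]]) -> List[str]:
    """Translate one system's rows into facts over the shared predicates."""
    fmap = FIELD_MAPS[system]
    facts = []
    for row in rows:
        program = prolog_atom(row[fmap["program"]])
        facts.append(f"program({program}, {prolog_atom(system)}).")
        facts.append(f"used({program}, {prolog_atom(row[fmap['used']])}).")
        facts.append(
            f"was_generated_by({prolog_atom(row[fmap['generated']])}, {program})."
        )
    return facts


facts = to_facts("system_a", system_a_rows) + to_facts("system_b", system_b_rows)
print("\n".join(facts))
```

Once both systems are expressed with the same predicates, a single rule can link data produced in one system to its consumption in the other, which is the kind of cross-workflow query the abstract describes.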
Exploring Large Scale Receptor-Ligand Pairs in Molecular Docking Workflows in HPC Clouds
2014 IEEE 28th International Parallel & Distributed Processing Symposium Workshops
Computer-aided drug design techniques are important assets in the pharmaceutical industry because of their support for research and development of new drugs. Molecular docking (MD) predicts a specific compound's binding modes within the active site of target proteins. Since MD is a time-consuming process, existing approaches reduce the number of receptors or ligands in docking by evaluating only small sets of compounds. This restriction in the search space reduces the chances of uniformly covering the diverse space of compounds and misses opportunities to recognize whether new drugs can be identified. Another difficulty at large scale is analyzing the results, e.g. browsing all directories manually to find which pairs were docked successfully. To address these issues we explored the potential of data provenance analysis and parallel processing of SciCumulus, a cloud Scientific Workflow Management System. We present SciDock, a molecular docking-based virtual screening workflow, and evaluate its execution using 10,000 receptor-ligand pairs related to protease enzymes of protozoan genomes. The overall performance of SciDock using 32 cores, in cloud virtual machines, reaches improvements of up to 95.4% when running SciDock with AutoDock and 96.1% when running SciDock with Vina. We show how data provenance improved the result analysis and how it may indicate potential protease drug targets for protozoan treatments.
Keywords: workflow; cloud; drug discovery
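To put the reported percentages in perspective, the short calculation below converts a reduction in execution time into a speedup factor and a parallel efficiency. It rests on two assumptions that the abstract does not state: that "improvement" means a reduction in execution time, and that the baseline is a single-core run. The function names are illustrative.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Speedup implied by cutting execution time by reduction_pct percent."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup achieved on the given core count."""
    return speedup / cores


CORES = 32  # core count reported in the abstract
for label, pct in [("AutoDock", 95.4), ("Vina", 96.1)]:
    s = speedup_from_reduction(pct)
    e = parallel_efficiency(s, CORES)
    print(f"{label}: speedup ~{s:.1f}x, efficiency ~{e:.2f} on {CORES} cores")
```

Under these assumptions, a 95.4% reduction corresponds to a speedup of roughly 1 / 0.046, and a 96.1% reduction to roughly 1 / 0.039. If the baseline in the paper is different, the efficiency figures would not apply.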
Deep learning from phylogenies to understand the dynamics of epidemics
Capturing and Querying Workflow Runtime Provenance with PROV: a Practical Approach
Scientific workflows are commonly used to model and execute large-scale scientific experiments. They represent key resources for scientists and are enacted and managed by Scientific Workflow Management Systems (SWfMS). Each SWfMS has its particular approach to executing workflows and to capturing and managing their provenance data. Due to the large scale of experiments, it may be unviable to analyze provenance data only after the end of the execution. A single experiment may take weeks to run, even in high-performance computing environments. Thus scientists need to monitor the experiment during its execution, and this can be done through provenance data. Runtime provenance analysis allows scientists to monitor workflow execution and to take actions before it ends (i.e., workflow steering). This provenance data can also be used to fine-tune the parallel execution of the workflow dynamically. We use the PROV data model as a basic framework for modeling and providing runtime provenance as a database that can be queried even during the execution. This database is agnostic of SWfMS and workflow engine. We show the benefits of representing and sharing runtime provenance data for improving the experiment management as well as the analysis of the scientific data.
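A minimal sketch of runtime provenance as a queryable database is shown below. It borrows the PROV concepts of Entity and Activity and the relations used and wasGeneratedBy, but the table layout, the helper functions `start_activity` and `end_activity`, and the sample workflow steps are assumptions made for illustration, not the schema described in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE activity (id TEXT PRIMARY KEY, started_at TEXT NOT NULL, ended_at TEXT);
    CREATE TABLE entity (id TEXT PRIMARY KEY);
    CREATE TABLE used (activity_id TEXT REFERENCES activity(id),
                       entity_id TEXT REFERENCES entity(id));
    CREATE TABLE was_generated_by (entity_id TEXT REFERENCES entity(id),
                                   activity_id TEXT REFERENCES activity(id));
    """
)


def start_activity(activity_id, inputs, started_at):
    """Record that an activity started and which entities it used."""
    with conn:  # commit immediately so monitors see the new rows
        conn.execute(
            "INSERT INTO activity (id, started_at) VALUES (?, ?)",
            (activity_id, started_at),
        )
        for entity_id in inputs:
            conn.execute("INSERT OR IGNORE INTO entity (id) VALUES (?)", (entity_id,))
            conn.execute("INSERT INTO used VALUES (?, ?)", (activity_id, entity_id))


def end_activity(activity_id, outputs, ended_at):
    """Record that an activity finished and which entities it generated."""
    with conn:
        conn.execute(
            "UPDATE activity SET ended_at = ? WHERE id = ?", (ended_at, activity_id)
        )
        for entity_id in outputs:
            conn.execute("INSERT OR IGNORE INTO entity (id) VALUES (?)", (entity_id,))
            conn.execute(
                "INSERT INTO was_generated_by VALUES (?, ?)", (entity_id, activity_id)
            )


# Hypothetical workflow: the second step is still running when we query.
start_activity("align", ["seqs.fasta"], "2024-01-01T10:00:00")
end_activity("align", ["aln.phy"], "2024-01-01T10:05:00")
start_activity("tree", ["aln.phy"], "2024-01-01T10:05:01")

running = [row[0] for row in conn.execute(
    "SELECT id FROM activity WHERE ended_at IS NULL")]
produced = [row[0] for row in conn.execute(
    "SELECT entity_id FROM was_generated_by ORDER BY entity_id")]
print("still running:", running)
print("data produced so far:", produced)
```

Because each provenance record is committed as soon as the event happens, queries like these can run while the workflow is still executing, which is what makes monitoring and steering possible.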